KMID : 1155220220470030217
Journal of the Korean Society of Health Information and Health Statistics
2022 Volume.47 No. 3 p.217 ~ p.221
Performance Comparison of Imputation Methods Using Machine Learning Techniques for Ordinal Missing Data
Son Se-Rhim

An Hyung-Gin
Abstract
Objectives: When missing values occur, complete case analysis can produce biased results. In this paper, we discuss imputation methods based on machine learning techniques for missing values in ordinal variables.

Methods: We consider two machine learning techniques for imputing missing values: the ordinal decision tree, which treats the variables as ordinal, and the random forest, which treats them as nominal. In addition, we apply the cumulative logistic model. The results are compared with complete case analysis in terms of empirical bias, empirical mean squared error, and accuracy. The same methods are applied to data from the Korea National Health and Nutrition Examination Survey.
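
The three comparison criteria can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumed to be the standard ones: bias and mean squared error averaged over simulation replicates, and accuracy as the share of imputed categories that match the true ones. The function names and the numbers are hypothetical and do not come from the paper.

```python
from statistics import mean


def empirical_bias(estimates, true_value):
    """Average of (estimate - truth) over simulation replicates (assumed definition)."""
    return mean(e - true_value for e in estimates)


def empirical_mse(estimates, true_value):
    """Average of squared (estimate - truth) over simulation replicates (assumed definition)."""
    return mean((e - true_value) ** 2 for e in estimates)


def accuracy(imputed, actual):
    """Share of imputed ordinal categories that equal the true categories."""
    if len(imputed) != len(actual):
        raise ValueError("imputed and actual must have the same length")
    return sum(i == a for i, a in zip(imputed, actual)) / len(actual)


# Made-up illustrative values, not results from the paper.
estimates = [0.48, 0.52, 0.55, 0.45]   # one estimate per simulation replicate
true_value = 0.50                      # parameter value used to generate the data
bias = empirical_bias(estimates, true_value)
mse = empirical_mse(estimates, true_value)

# Five-category ordinal variable: imputed versus held-out true categories.
acc = accuracy([1, 2, 3, 3, 5], [1, 2, 3, 4, 5])
```

Accuracy as written counts an imputation one category off the same as one four categories off, which may be one reason a method can score well on accuracy and less well on bias.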

Results: With five ordinal categories, the machine learning techniques outperform both the cumulative logistic model and complete case analysis. The ordinal decision tree shows lower bias, while the random forest shows higher accuracy. With three categories, the random forest performs better in all respects. In the case study, complete case analysis again yields biased results. The random forest shows the best performance, and the parametric method performs similarly to the ordinal decision tree.

Conclusions: Imputing missing values with machine learning techniques can reduce bias and improve performance. Where possible, we recommend imputing with the ordinal decision tree, which reflects the ordering of the categories. Where that is not possible, we recommend treating the variables at least as nominal before imputing.
KEYWORD
Machine learning, Regression analysis, Decision tree, Big data, Health survey
Listed journal information
Korea Research Foundation (KCI)